Adversarial Learning for Chinese NER from Crowd Annotations

نویسندگان

  • YaoSheng Yang
  • Meishan Zhang
  • Wenliang Chen
  • Wei Zhang
  • Haofen Wang
  • Min Zhang
چکیده

To quickly obtain new labeled data, we can choose crowdsourcing as an alternative way at lower cost in a short time. But as an exchange, crowd annotations from non-experts may be of lower quality than those from experts. In this paper, we propose an approach to performing crowd annotation learning for Chinese Named Entity Recognition (NER) to make full use of the noisy sequence labels from multiple annotators. Inspired by adversarial learning, our approach uses a common Bi-LSTM and a private Bi-LSTM for representing annotatorgeneric and -specific information. The annotator-generic information is the common knowledge for entities easily mastered by the crowd. Finally, we build our Chinese NE tagger based on the LSTM-CRF model. In our experiments, we create two data sets for Chinese NER tasks from two domains. The experimental results show that our system achieves better scores than strong baseline systems.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Sembler: Ensembling Crowd Sequential Labeling for Improved Quality

Many natural language processing tasks, such as named entity recognition (NER), part of speech (POS) tagging, word segmentation, and etc., can be formulated as sequential data labeling problems. Building a sound labeler requires very large number of correctly labeled training examples, which may not always be possible. On the other hand, crowdsourcing provides an inexpensive yet efficient alter...

متن کامل

The Effects of Multimedia Annotations on Iranian EFL Learners’ L2 Vocabulary Learning

In our modern technological world, Computer-Assisted Language learning (CALL) is a new realm towards learning a language in general, and learning L2 vocabulary in particular. It is assumed that the use of multimedia annotations promotes language learners’ vocabulary acquisition. Therefore, this study set out to investigate the effects of different multimedia annotations (still picture annotatio...

متن کامل

Crowd Counting With Minimal Data Using Generative Adversarial Networks For Multiple Target Regression

In this work, we use a generative adversarial network (GAN) to train crowd counting networks using minimal data. We describe how GAN objectives can be modified to allow for the use of unlabeled data to benefit inference training in semi-supervised learning. More generally, we explain how these same methods can be used in more generic multiple regression target semi-supervised learning, with cro...

متن کامل

Harnessing Diversity in Crowds and Machines for Better NER Performance

Over the last years, information extraction tools have gained a great popularity and brought significant performance improvement in extracting meaning from structured or unstructured data. For example, named entity recognition (NER) tools identify types such as people, organizations or places in text. However, despite their high F1 performance, NER tools are still prone to brittleness due to th...

متن کامل

Clickstream analysis for crowd-based object segmentation with confidence

With the rapidly increasing interest in machine learning based solutions for automatic image annotation, the availability of reference annotations for algorithm training is one of the major bottlenecks in the field. Crowdsourcing has evolved as a valuable option for low-cost and large-scale data annotation; however, quality control remains a major issue which needs to be addressed. To our knowl...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1801.05147  شماره 

صفحات  -

تاریخ انتشار 2018